ROBUS: Fair Cache Allocation for Multi-tenant Data-parallel Workloads
Abstract
Systems for processing big data (e.g., Hadoop, Spark, and massively parallel databases) need to run workloads on behalf of multiple tenants simultaneously. The abundant disk-based storage in these systems is usually complemented by a smaller, but much faster, cache. Cache is a precious resource: tenants who get to use cache can see two orders of magnitude performance improvement. Cache is also a limited and hence shared resource: unlike a resource like a CPU core, which can be used by only one tenant at a time, a cached data item can be accessed by multiple tenants at the same time. Cache, therefore, has to be shared by a multi-tenancy-aware policy across tenants, each having a unique set of priorities and workload characteristics. In this paper, we develop cache allocation strategies that speed up the overall workload while being fair to each tenant. We build a novel fairness model targeted at the shared resource setting that not only incorporates the more standard concepts of Pareto-efficiency and sharing incentive, but also defines envy-freeness via the notion of the core from cooperative game theory. Our framework, ROBUS, uses randomization over small time batches, and we develop a proportionally fair allocation mechanism that satisfies the core property in expectation. We show that this algorithm and related fair algorithms can be approximated to arbitrary precision in polynomial time. We evaluate these algorithms on a ROBUS prototype implemented on Spark, with the RDD store used as cache. Our evaluation on a synthetically generated industry-standard workload shows that our algorithms provide a speedup close to that of performance-optimal algorithms while guaranteeing fairness across tenants.
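To make the idea of a proportionally fair, randomized allocation more concrete, the sketch below computes a probability distribution over candidate cache configurations that maximizes the sum of the logarithms of the tenants' expected utilities, and then samples one configuration per time batch. This is an illustration only, not the ROBUS algorithm from the paper: the function names, the utility matrix, and the choice of a simple multiplicative-update solver are all assumptions made for this example.

```python
import random


def proportional_fair_mix(utils, iters=500):
    """Approximate a proportionally fair distribution over cache configurations.

    utils[i][s] is the (hypothetical) utility tenant i gets if configuration s
    is cached for a batch. Assumes every tenant has positive utility in at
    least one configuration. Maximizes sum_i log(expected utility of tenant i)
    with a standard multiplicative update; this solver is an assumption of
    this sketch, not the method described in the paper.
    """
    n, m = len(utils), len(utils[0])
    x = [1.0 / m] * m  # start from the uniform distribution
    for _ in range(iters):
        # Expected utility of each tenant under the current distribution.
        expected = [sum(x[s] * utils[i][s] for s in range(m)) for i in range(n)]
        # Reweight each configuration by its average relative benefit.
        x = [
            x[s] * sum(utils[i][s] / expected[i] for i in range(n)) / n
            for s in range(m)
        ]
    return x


def pick_configuration(x, rng=random):
    """Sample one configuration index for the next time batch."""
    return rng.choices(range(len(x)), weights=x)[0]


# Two tenants, three candidate configurations: configuration 0 helps only
# tenant 0, configuration 1 helps only tenant 1, and configuration 2 caches
# a data item that both tenants read.
utils = [[1.0, 0.0, 1.0],
         [0.0, 1.0, 1.0]]
mix = proportional_fair_mix(utils)
chosen = pick_configuration(mix)
```

In this toy input the distribution concentrates on the configuration that both tenants can read, which reflects the abstract's point that a cached item, unlike a CPU core, can serve several tenants at once. When no such shared configuration exists, the same routine splits probability between the tenants' preferred configurations.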
Similar resources
Performance Isolation and Fairness for Multi-Tenant Cloud Storage
Shared storage services enjoy wide adoption in commercial clouds. But most systems today provide weak performance isolation and fairness between tenants, if at all. Misbehaving or high-demand tenants can overload the shared service and disrupt other well-behaved tenants, leading to unpredictable performance and violating SLAs. This paper presents Pisces, a system for achieving datacenter-wide p...
ElCached: Elastic Multi-Level Key-Value Cache
Today’s cloud service providers (CSPs) use in-memory caching engines to improve application performance and server revenue. However, these caching engines exhibit poor scaling, mainly because of high DRAM cost and energy consumption. On the other hand, the increasing use of multi-tenancy requires effective and optimal resource provisioning. In this paper, we introduce ElCached, a multi...
Performance Interference of Multi-tenant, Big Data Frameworks in Resource Constrained Private Clouds
In this paper, we investigate and characterize the behavior of “big” and “fast” data analysis frameworks, in multitenant, shared settings for which computing resources (CPU and memory) are limited. Such settings and frameworks are frequently employed in both public and private cloud deployments. Resource constraints stem from both physical limitations (private clouds) and what the user is willi...
Advanced Cache Techniques for SLA-Driven Multi-Tenant Application on PaaS
Multi-tenant applications are one of the main characteristics of cloud computing. Today, most applications use a cache service for faster access and low response time. Currently, in multi-tenant cloud applications, data are often evicted mistakenly by the cache service, which is managed by existing algorithms such as LRU. Also, security mechanisms are implemented to avoid data breach when ...
Ease.ml: Towards Multi-tenant Resource Sharing for Machine Learning Workloads
We present ease.ml, a declarative machine learning service platform we built to support more than ten research groups outside the computer science departments at ETH Zurich for their machine learning needs. With ease.ml, a user defines the high-level schema of a machine learning application and submits the task via a Web interface. The system automatically deals with the rest, such as model sel...
Journal title: CoRR
Volume: abs/1504.06736
Publication date: 2015